Asymptotic Dynamics of Alternating Minimization for Non-Convex Optimization

Okajima, Koki, Takahashi, Takashi

arXiv.org Machine Learning

This study investigates the asymptotic dynamics of alternating minimization applied to optimizing a bilinear non-convex function with normally distributed covariates. We employ the replica method from statistical physics in a multi-step approach to precisely trace the algorithm's evolution. Our findings indicate that the dynamics can be described effectively by a two-dimensional discrete stochastic process in which each step depends on all previous time steps, revealing a memory dependency in the procedure. The theoretical framework developed in this work is broadly applicable to the analysis of various iterative algorithms, extending beyond alternating minimization.
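As an illustration only (the paper's exact model and its replica analysis are not reproduced here), alternating minimization on a bilinear least-squares objective with Gaussian covariates can be sketched as follows; the model y = (Xu) ⊙ (Xv) and all dimensions are hypothetical choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 20
X = rng.standard_normal((n, d))          # normally distributed covariates
u_star = rng.standard_normal(d)
v_star = rng.standard_normal(d)
y = (X @ u_star) * (X @ v_star)          # bilinear signal (illustrative model)

u = rng.standard_normal(d)
v = rng.standard_normal(d)
losses = []
for t in range(50):
    # with v fixed, the objective is ordinary least squares in u
    A = X * (X @ v)[:, None]
    u = np.linalg.lstsq(A, y, rcond=None)[0]
    # with u fixed, the objective is ordinary least squares in v
    B = X * (X @ u)[:, None]
    v = np.linalg.lstsq(B, y, rcond=None)[0]
    losses.append(float(np.mean((y - (X @ u) * (X @ v)) ** 2)))
```

Each half-step solves its subproblem exactly, so the loss is non-increasing even though the joint problem is non-convex; tracking how the iterates fluctuate around this descent path is what the stochastic-process description captures.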


Analyzing Convergence in Quantum Neural Networks: Deviations from Neural Tangent Kernels

You, Xuchen, Chakrabarti, Shouvanik, Chen, Boyang, Wu, Xiaodi

arXiv.org Artificial Intelligence

A quantum neural network (QNN) is a parameterized mapping efficiently implementable on near-term Noisy Intermediate-Scale Quantum (NISQ) computers. It can be used for supervised learning when combined with classical gradient-based optimizers. Despite the existing empirical and theoretical investigations, the convergence of QNN training is not fully understood. Inspired by the success of neural tangent kernels (NTKs) in probing the dynamics of classical neural networks, a recent line of work proposes to study over-parameterized QNNs by examining a quantum version of tangent kernels. In this work, we study the dynamics of QNNs and show that, contrary to popular belief, it is qualitatively different from that of any kernel regression: due to the unitarity of quantum operations, there is a non-negligible deviation from the tangent kernel regression derived at the random initialization. As a result of this deviation, we prove at-most sublinear convergence for QNNs with Pauli measurements, which is beyond the explanatory power of any kernel regression dynamics. We then present the actual dynamics of QNNs in the limit of over-parameterization. The new dynamics capture the change of convergence rate during training and imply that the range of measurements is crucial to fast QNN convergence.
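For contrast, the frozen-kernel picture that the paper argues QNNs deviate from can be sketched in a few lines: if the tangent kernel K stays fixed at its initialization value, gradient descent drives the training residual by r ← (I − ηK)r, so its norm decays monotonically. The kernel below is a random PSD placeholder, not a quantum tangent kernel:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20
G = rng.standard_normal((n, n))
K = G @ G.T / n                          # placeholder PSD "tangent kernel"
r = rng.standard_normal(n)               # residual f(x_i) - y_i at initialization
eta = 1.0 / np.linalg.eigvalsh(K).max()  # step size below the stability threshold

norms = []
for t in range(100):
    r = r - eta * (K @ r)                # linearized (kernel-regression) dynamics
    norms.append(float(np.linalg.norm(r)))
```

The paper's point is precisely that this picture fails for QNNs: unitarity makes the effective kernel drift during training, so the convergence rate changes rather than staying fixed by the initial spectrum.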


From Deterministic ODEs to Dynamic Structural Causal Models

Rubenstein, Paul K., Bongers, Stephan, Schoelkopf, Bernhard, Mooij, Joris M.

arXiv.org Artificial Intelligence

Structural Causal Models are widely used in causal modelling, but how they relate to other modelling tools is poorly understood. In this paper we provide a novel perspective on the relationship between Ordinary Differential Equations and Structural Causal Models. We show how, under certain conditions, the asymptotic behaviour of an Ordinary Differential Equation under non-constant interventions can be modelled using Dynamic Structural Causal Models. In contrast to earlier work, we do not restrict attention to the effect of interventions on equilibrium states; rather, we model asymptotic behaviour that remains dynamic under time-varying interventions, and recover the study of static equilibria as a special case.
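A minimal numerical illustration of the equilibrium side of this idea (using a hypothetical stable linear ODE, not the paper's general construction): integrate dx/dt = Ax + b to its observational equilibrium, then compare with the equilibrium reached when one variable is held fixed, which plays the role of an intervention:

```python
import numpy as np

A = np.array([[-1.0,  0.5],
              [ 0.0, -2.0]])             # stable system matrix (illustrative)
b = np.array([1.0, 2.0])

def equilibrate(clamp=None, dt=1e-3, steps=20000):
    """Euler-integrate dx/dt = A x + b; clamp=(i, v) holds x[i] := v, mimicking an intervention."""
    x = np.zeros(2)
    for _ in range(steps):
        x = x + dt * (A @ x + b)
        if clamp is not None:
            i, v = clamp
            x[i] = v
    return x

x_obs = equilibrate()                    # observational equilibrium: solves A x + b = 0
x_int = equilibrate(clamp=(1, 0.0))      # equilibrium under the intervention x2 := 0
```

The intervened equilibrium differs from the observational one, and the mapping from interventions to asymptotic states is what a (Dynamic) Structural Causal Model encodes without re-running the ODE.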


Two Approaches to Optimal Annealing

Leen, Todd K., Schottky, Bernhard, Saad, David

Neural Information Processing Systems

We employ both master equation and order parameter approaches to analyze the asymptotic dynamics of online learning with different learning rate annealing schedules. We examine the relations between the results obtained by the two approaches and obtain new results on the optimal decay coefficients and their dependence on the number of hidden nodes in a two-layer architecture.
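A concrete toy instance of the kind of question studied (illustrative only, not the paper's analysis): online SGD on a quadratic with an annealed learning rate η_t = c/(t+1). The decay coefficient c controls the asymptotics; for this problem c = 1 makes the iterate the running sample mean, while a too-small c leaves a slowly decaying bias:

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma, T = 3.0, 1.0, 20000

def online_sgd(c):
    """Minimize E[(w - x)^2] / 2 from samples x ~ N(mu, sigma^2) with eta_t = c/(t+1)."""
    w = 0.0
    for t in range(T):
        x = mu + sigma * rng.standard_normal()
        w -= (c / (t + 1)) * (w - x)     # annealed stochastic gradient step
    return w

w_good = online_sgd(1.0)   # c = 1: iterate equals the running sample mean
w_slow = online_sgd(0.1)   # under-annealed: bias decays only like t**(-0.1)
```

The sensitivity of the final error to c is exactly the "optimal decay coefficient" question; the paper characterizes it analytically rather than by simulation.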


Two Approaches to Optimal Annealing

Leen, Todd K., Schottky, Bernhard, Saad, David

Neural Information Processing Systems

The latter studies are based on examining the Kramers-Moyal expansion of the master equation for the weight-space probability densities. A different approach, based on the deterministic dynamics of macroscopic quantities called order parameters, has recently been presented [6, 7]. This approach enables one to monitor the evolution of the order parameters and the system performance at all times. In this paper we examine the relation between the two approaches and contrast the results obtained for different learning rate annealing schedules in the asymptotic regime. We employ the order parameter approach to examine the dependence of the dynamics on the number of hidden nodes in a multilayer system.